
Add multi-database support to cluster mode #1671

Open
wants to merge 16 commits into base: unstable

Conversation

@xbasel xbasel (Member) commented Feb 5, 2025

This commit introduces multi-database support in cluster mode while maintaining backward compatibility and requiring no API changes. Key features include:

  • Database-agnostic hashing: the hashing algorithm and slot calculation are unchanged, so identical keys map to the same slot across all databases (see the sketch after this list). This ensures consistency in key distribution and maintains compatibility with existing single-database setups.

  • Implementation is fully backward compatible with no API changes.

  • The core structure remains an array of databases, each containing a list of hashtables (one per slot).

  • Multi-DB support in cluster mode affects slot migration: tools need to iterate over all DBs.
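
For reference, here is a minimal sketch of the slot calculation the first bullet refers to, mirroring keyHashSlot() in src/cluster.c (the crc16() from src/crc16.c is assumed). Note that no database index appears anywhere in the computation, which is what makes hashing database-agnostic:

```c
#include <stdint.h>

/* CRC16 (XMODEM variant) as implemented in src/crc16.c is assumed. */
uint16_t crc16(const char *buf, int len);

/* Minimal sketch mirroring keyHashSlot() in src/cluster.c. The slot
 * depends only on the key bytes (or on its {hash tag}), so the same
 * key maps to the same slot in every database. */
unsigned int keyHashSlot(const char *key, int keylen) {
    int s, e; /* start and end of a {hash tag}, if present */

    for (s = 0; s < keylen; s++)
        if (key[s] == '{') break;

    /* No '{': hash the whole key. */
    if (s == keylen) return crc16(key, keylen) & 16383;

    /* '{' found: look for the matching '}'. */
    for (e = s + 1; e < keylen; e++)
        if (key[e] == '}') break;

    /* No '}' or empty tag "{}": hash the whole key. */
    if (e == keylen || e == s + 1) return crc16(key, keylen) & 16383;

    /* Otherwise hash only what is between the braces. */
    return crc16(key + s + 1, e - s - 1) & 16383;
}
```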

Command-Level Changes

  • SELECT / MOVE / COPY are now supported in cluster mode.
  • MOVE / COPY are rejected during slot migration to prevent multi-DB inconsistencies (see the sketch after this list).
  • Cluster management commands are global commands, except for GETKEYSINSLOT, COUNTKEYSINSLOT and MIGRATE, which run in selected-DB context.
  • SWAPDB remains disabled in cluster mode due to its non-atomic nature and potential inconsistencies across primaries.
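
To make the MOVE/COPY restriction concrete, here is an illustrative guard. The helper clusterAnySlotMigrating() is hypothetical, not the PR's actual API; moveCommand(), server.cluster_enabled, and addReplyError() are existing server names:

```c
/* Hypothetical guard: refuse cross-DB commands while any slot of this
 * node is being migrated, to avoid multi-DB inconsistencies. The helper
 * clusterAnySlotMigrating() is illustrative only. */
void moveCommand(client *c) {
    if (server.cluster_enabled && clusterAnySlotMigrating()) {
        addReplyError(c, "MOVE is not allowed during slot migration");
        return;
    }
    /* ... normal MOVE handling follows ... */
}
```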

Behavior Changes

  • Transaction Handling Changes (MULTI/EXEC)
    getNodeByQuery's key lookup behavior changes:

    • No key lookups when queuing commands in MULTI, only cross-slot validation.
    • Key lookups happen at EXEC time in the correct database.
    • SELECT inside MULTI/EXEC is now tracked, ensuring key validation uses the selected DB at lookup time.
  • The MIGRATE command operates in the selected-DB context. Note that the MIGRATE parameter destination-db is still honored: when migrating keys, they can be sent to a different database on the target, as in non-cluster mode.

Slot migration process changes when multiple databases are used:

    For each database:
        SELECT database
        keys = CLUSTER GETKEYSINSLOT slot count
        MIGRATE target keys (destination-db = selected database)

Valkey-cli has been updated to support resharding across all databases.
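
A minimal C sketch of that loop using hiredis (error handling omitted; the batch size of 100, the 5000 ms timeout, and a fixed 16 databases are assumptions here, and valkey-cli's actual resharding code is structured differently):

```c
#include <stddef.h>
#include <hiredis/hiredis.h>

/* Migrate every key of `slot`, database by database, from `src` to the
 * target node. MIGRATE's destination-db is set to the source DB index
 * so keys keep their database on the target. */
void migrateSlotAllDbs(redisContext *src, const char *dst_host,
                       int dst_port, int slot) {
    for (int db = 0; db < 16; db++) {
        redisReply *r = redisCommand(src, "SELECT %d", db);
        freeReplyObject(r);

        /* Fetch a batch of keys in this slot for the selected DB. */
        r = redisCommand(src, "CLUSTER GETKEYSINSLOT %d %d", slot, 100);
        for (size_t i = 0; i < r->elements; i++) {
            redisReply *m = redisCommand(src, "MIGRATE %s %d %s %d %d",
                                         dst_host, dst_port,
                                         r->element[i]->str, db, 5000);
            freeReplyObject(m);
        }
        freeReplyObject(r);
    }
}
```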

Implements #1319

codecov bot commented Feb 5, 2025

Codecov Report

Attention: Patch coverage is 82.60870% with 12 lines in your changes missing coverage. Please review.

Project coverage is 71.18%. Comparing base (2eac2cc) to head (362b659).
Report is 60 commits behind head on unstable.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/cluster.c | 57.14% | 9 Missing ⚠️ |
| src/valkey-cli.c | 92.59% | 2 Missing ⚠️ |
| src/cluster_legacy.c | 94.44% | 1 Missing ⚠️ |
Additional details and impacted files
@@             Coverage Diff              @@
##           unstable    #1671      +/-   ##
============================================
+ Coverage     70.97%   71.18%   +0.20%     
============================================
  Files           121      123       +2     
  Lines         65238    65667     +429     
============================================
+ Hits          46305    46745     +440     
+ Misses        18933    18922      -11     
| Files with missing lines | Coverage Δ |
|---|---|
| src/config.c | 78.35% <ø> (-0.06%) ⬇️ |
| src/db.c | 89.91% <100.00%> (+0.34%) ⬆️ |
| src/server.h | 100.00% <ø> (ø) |
| src/valkey-benchmark.c | 62.00% <ø> (+1.86%) ⬆️ |
| src/cluster_legacy.c | 86.11% <94.44%> (+0.20%) ⬆️ |
| src/valkey-cli.c | 56.36% <92.59%> (+0.48%) ⬆️ |
| src/cluster.c | 88.03% <57.14%> (-1.20%) ⬇️ |

... and 39 files with indirect coverage changes

@xbasel xbasel marked this pull request as ready for review February 10, 2025 21:37
@xbasel xbasel requested a review from zuiderkwast February 10, 2025 22:13
src/db.c Outdated
@@ -1728,12 +1714,6 @@ void swapMainDbWithTempDb(serverDb *tempDb) {
void swapdbCommand(client *c) {
int id1, id2;

/* Not allowed in cluster mode: we have just DB 0 there. */

Would that be enough for swapdb to work in cluster mode? What will happen in a setup with 2 shards, each responsible for half of the slots in each DB?

@xbasel xbasel (Member Author) commented Feb 11, 2025

With this implementation SWAPDB must be executed in all primary nodes. There are three options:

  1. Allow SWAPDB and shift responsibility to the user – Risky, non-atomic, can cause temporary inconsistency and data corruption. Needs strong warnings.
  2. Keep SWAPDB disabled in cluster mode – Safest, avoids inconsistency.
  3. Make SWAPDB cluster-wide and atomic – Complex, unclear feasibility.

I think option 2 is the safest bet. @JoBeR007 wdyt?

Contributor

Is SWAPDB replicated as a single command? Then it's atomic.

If it's risky, it's risky in standalone mode with replicas too, right?

I think we can allow it. Swapping the data can only be done in some non-realtime workloads anyway I think.


I think the replication risk and the risk of having to execute SWAPDB on all primary nodes are unrelated: as a user you can't control the first, but the user is the main source of risk in the second case.
I would keep SWAPDB disabled in cluster mode if we decide to continue with this implementation.

Contributor

In cluster mode, consistency is per slot.

Contributor

Yes, FLUSHDB is very similar in this regard. If a failover happens just before this command has been propagated to replicas, it's a big thing, but it's no surprise I think. The client can use WAIT or check replication offset to make sure the FLUSHDB or SWAPDB was successful on the replicas.

Member

Regarding this, I think it is not just an issue of Multi-database but is more related to atomic slot migration. If a shard is in a stable state (not undergoing slot migration), then flushdb/flushall/swapdb are safe. However, if slot migration is in progress, it might lead to data inconsistency.

I think this needs to be considered alongside atomic-slot-migration:

  1. During the ATM process, for slots being migrated, if we encounter flushall/flushdb, we can send a command like flushslot or flushslotall to the target shard
  2. As for swapdb, I recommend temporarily prohibiting execution during the ATM process

@PingXie @enjoy-binbin , please also take note of this.

Member
Makes sense. @murphyjacob4 FYI

Member

I made a comment on the issue about this, but also worth mentioning it's hard to orchestrate SWAPDB. Even in steady state, flushdb and flushall are idempotent (you can send them multiple times) but swapdb isn't. If a command times out on one node, it's hard to reason about if it was successful and how to retry it. I think we should continue to disable SWAPDB in cluster mode for now, unless we introduce an idempotent way to do the swap.

Member Author

Maybe introducing UUID tracking for SWAPDB requests would work.
Disabling SWAPDB for now.
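
A sketch of that UUID-tracking idea, purely hypothetical and not part of this PR: remember the id of the last applied swap so that a retried SWAPDB carrying the same id becomes a no-op, addressing the idempotency concern raised above (dbSwapDatabases() is the existing helper in src/db.c):

```c
#include <string.h>

int dbSwapDatabases(int id1, int id2); /* existing helper in src/db.c */

/* Hypothetical idempotent swap: each cluster-wide SWAPDB request carries
 * a UUID; a node applies the swap at most once per UUID, so an
 * orchestrator can safely retry after a timeout. */
static char last_swap_id[64];

int swapdbIdempotent(const char *req_id, int id1, int id2) {
    if (strcmp(req_id, last_swap_id) == 0) return 0; /* already applied */
    dbSwapDatabases(id1, id2);
    strncpy(last_swap_id, req_id, sizeof(last_swap_id) - 1);
    last_swap_id[sizeof(last_swap_id) - 1] = '\0';
    return 1;
}
```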

@soloestoy soloestoy requested review from soloestoy and removed request for zuiderkwast February 12, 2025 06:28
src/cluster.c Outdated
@@ -1102,7 +1110,7 @@ getNodeByQuery(client *c, struct serverCommand *cmd, robj **argv, int argc, int
* NODE <node-id>. */
int flags = LOOKUP_NOTOUCH | LOOKUP_NOSTATS | LOOKUP_NONOTIFY | LOOKUP_NOEXPIRE;
if ((migrating_slot || importing_slot) && !pubsubshard_included) {
-    if (lookupKeyReadWithFlags(&server.db[0], thiskey, flags) == NULL)
+    if (lookupKeyReadWithFlags(c->db, thiskey, flags) == NULL)
Member

Here, I modified it to use c->db, so for most commands, the key it wants to access can be correctly located. However, some cross-DB commands, such as COPY, still require additional checks. The ultimate solution is atomic-slot-migration I believe. Once ATM is implemented, the TRYAGAIN issue will no longer occur.

Member

I noticed that getNodeByQuery doesn't follow selects either, so this might not be the right database. If you for example have:

SELECT 0
GET FOO
SELECT 1
GET FOO

c->db won't be correct here either. COPY and MOVE have the same problem, as mentioned. I wonder if there is some way to make this correct without having ATM, so we can limit the breakage when moving from standalone to cluster.

Member

Generally, c->db provides the correct context. Are you referring to the scenario where the SELECT command is also used within a transaction (MULTI/EXEC)?

Member Author

I've noticed that getNodeByQuery is being invoked while queuing commands in a MULTI context. Is this intentional? It seems unnecessary to check for key existence before execution, as the database state can change and keys might be migrated. I would expect this check to happen when EXEC is executed instead. Any thoughts?

Member Author

The only benefit of this early validation, as I see it, is detecting cross-slot keys sooner. I think key existence validation should happen during EXEC execution.

Member Author

@madolson / @soloestoy
Can you check da1ee65?

Member

Looks good, but COPY and MOVE still have the problem. A simple way is to refuse these commands during slot migration, or we can wait until atomic slot migration is finished, so we don't need to check the migrating status.

Member Author

Yes, the commit above does not address COPY and MOVE.
I've merged a commit to reject these commands during slot migration.

@soloestoy (Member) commented:
I'm happy that we did "Unified db rehash method for both standalone and cluster #12848" when developing kvstore, which made the implementation of multi-database support simpler.

@ranshid ranshid added the release-notes This issue should get a line item in the release notes label Feb 17, 2025
@hpatro hpatro (Collaborator) left a comment

We need to add history to the SWAPDB, SELECT, and MOVE JSON files to indicate they're supported since 9.0.

@@ -0,0 +1,481 @@
# Tests multi-databases in cluster mode
Member

This is the legacy clustering system. Ideally this test should be in unit/cluster



@@ -1,5 +1,12 @@
start_server {tags {"lazyfree"}} {
test "UNLINK can reclaim memory in background" {

# The test framework invokes "flushall", replacing kvstores even if empty.
Member

I would rather we did a sync flushall in the test framework, then, so we don't have these random waits all over the place.

Member Author

The wait here is only to allow lazy free to complete and for used_memory to update. We don't need to sleep after using FLUSHALL in other tests.

Additionally, once #1609 is merged, it's unlikely that this sleep will be necessary.

@madolson madolson (Member) commented Feb 25, 2025

How are you guaranteeing we never need to wait for the FLUSHALL in other tests?

Member Author

I don’t. If FLUSHALL is just wiping databases, there’s no need to wait. The wait here is only for observing memory impact. Why do you think we need to wait every time FLUSHALL is called?

Member

I was thinking that other people might not be aware of this constraint, and might encounter similar issues to you where the memory is not behaving the way they expect. We actually recently made the change (FLUSHALL used to default to being sync until Valkey 8.0). So maybe we should be explicitly doing the FLUSHALL SYNC inside the test framework itself.

Also, now that I read this again, is this true? The cluster test framework invokes flushall, but this test doesn't seem to invoke flushall at all. Is this still necessary?

@ranshid ranshid added the client-changes-needed Client changes are required for this feature label Feb 24, 2025
… in cluster mode

Previously, key lookup validation in cluster mode was performed both when
queuing and executing commands in a `MULTI/EXEC` transaction. However, this
was unnecessary because:

1. If we check for key existence when queuing, the keys might not exist
   anymore when `EXEC` runs.
2. The only check that matters at queuing time is cross-slot validation,
   since commands in a transaction must operate within the same slot.
3. Key lookups should only happen at `EXEC` time when the command actually
   runs.

- Removed key lookup validation at queuing time, keeping only cross-slot
  validation.
- Modified `getNodeByQuery` to detect `SELECT` when scanning `MULTI` commands
  and update the database pointer accordingly.
- Now, key lookups are performed **only** at `EXEC` time, ensuring validation
  happens when the command actually executes.

- **Before:** Key lookups were performed both when queuing and executing
  `MULTI/EXEC`, which was redundant and could lead to incorrect assumptions.
- **Now:** Only cross-slot validation is done at queuing. Key lookups are
  performed at `EXEC`, ensuring accuracy and correctness.

Signed-off-by: xbasel <[email protected]>
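
An illustrative sketch of the queuing-time scan this commit message describes: follow SELECT among the queued MULTI commands so later validation uses the right DB, and defer key-existence lookups to EXEC. Field and helper names (c->mstate, string2ll(), selectCommand) follow the server's existing conventions, but the actual diff may differ:

```c
/* Resolve which database EXEC will run in by scanning the queued MULTI
 * commands for SELECT. Only cross-slot validation happens at queuing
 * time; key-existence lookups are deferred until EXEC. */
static serverDb *effectiveDbForMulti(client *c) {
    serverDb *db = c->db;
    for (int i = 0; i < c->mstate.count; i++) {
        multiCmd *mc = &c->mstate.commands[i];
        if (mc->cmd->proc == selectCommand && mc->argc == 2) {
            long long id;
            if (string2ll(mc->argv[1]->ptr, sdslen(mc->argv[1]->ptr), &id) &&
                id >= 0 && id < server.dbnum)
                db = &server.db[id]; /* follow SELECT in the transaction */
        }
    }
    return db;
}
```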
unsigned int numkeys = maxkeys > keys_in_slot ? keys_in_slot : maxkeys;
addReplyArrayLen(c, numkeys);
kvstoreHashtableIterator *kvs_di = NULL;
-kvs_di = kvstoreGetHashtableIterator(server.db->keys, slot, 0);
+kvs_di = kvstoreGetHashtableIterator(c->db->keys, slot, 0);
@hwware hwware (Member) commented Feb 26, 2025

I think this change is not compatible with current user expectations.
Today, CLUSTER GETKEYSINSLOT and CLUSTER COUNTKEYSINSLOT only return values from db0; with this change they return the sum across all DBs.
I think the better way is to add a parameter (such as a db number) to CLUSTER GETKEYSINSLOT and CLUSTER COUNTKEYSINSLOT to target a specific db, and to add one more cluster command to get keys across all DBs.

Maybe we need to discuss the details.

Member

@hwware we are discussing two commands at #1319 (comment)

Member Author

@hwware Worth noting that existing clients (working exclusively with DB0) would not be impacted. Since those clients already operate on DB0, both COUNTKEYSINSLOT and GETKEYSINSLOT will return the same results whether iterating over all databases or reading only from DB0.

Only clients that choose to use multiple databases may need to make adjustments.

I also agree that these commands can be further enhanced by adding optional parameters. Please see my comment here:
#1319 (comment)

@xbasel xbasel added the documentation Improvements or additions to documentation label Feb 26, 2025